Explaining Image Classifiers with Multiscale Directional Image Representation
Image classifiers are known to be difficult to interpret and therefore
require explanation methods to understand their decisions. We present
ShearletX, a novel mask explanation method for image classifiers based on the
shearlet transform -- a multiscale directional image representation. Current
mask explanation methods are regularized by smoothness constraints that protect
against undesirable fine-grained explanation artifacts. However, the smoothness
of a mask limits its ability to separate fine-detail patterns that are
relevant to the classifier from nearby nuisance patterns that do not affect
the classifier. ShearletX solves this problem by avoiding smoothness
regularization altogether, replacing it with shearlet sparsity constraints. The
resulting explanations consist of a few edges, textures, and smooth parts of
the original image that are the most relevant for the decision of the
classifier. To support our method, we propose a mathematical definition for
explanation artifacts and an information theoretic score to evaluate the
quality of mask explanations. We demonstrate the superiority of ShearletX over
previous mask-based explanation methods using these new metrics, and present
exemplary situations where separating fine-detail patterns allows explaining
phenomena that were not explainable before.
SuperHF: Supervised Iterative Learning from Human Feedback
While large language models demonstrate remarkable capabilities, they often
present challenges in terms of safety, alignment with human values, and
stability during training. Here, we focus on two prevalent methods used to
align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning
from Human Feedback (RLHF). SFT is simple and robust, powering a host of
open-source models, while RLHF is a more sophisticated method used in top-tier
models like ChatGPT but also suffers from instability and susceptibility to
reward hacking. We propose a novel approach, Supervised Iterative Learning from
Human Feedback (SuperHF), which seeks to leverage the strengths of both
methods. Our hypothesis is two-fold: that the reward model used in RLHF is
critical for efficient data use and model generalization and that the use of
Proximal Policy Optimization (PPO) in RLHF may not be necessary and could
contribute to instability issues. SuperHF replaces PPO with a simple supervised
loss and a Kullback-Leibler (KL) divergence prior. It creates its own training
data by repeatedly sampling a batch of model outputs and filtering them through
the reward model in an online learning regime. We then break down the reward
optimization problem into three components: robustly optimizing the training
rewards themselves, preventing reward hacking (exploitation of the reward model
that degrades model performance) as measured by a novel METEOR similarity
metric, and maintaining good performance on downstream evaluations. Our
experimental results show SuperHF exceeds PPO-based RLHF on the training
objective, easily and favorably trades off high reward with low reward hacking,
improves downstream calibration, and performs the same on our GPT-4-based
qualitative evaluation scheme, all the while being significantly simpler to
implement, highlighting SuperHF's potential as a competitive language model
alignment technique.
Comment: Accepted to the Socially Responsible Language Modelling Research
(SoLaR) workshop at NeurIPS 202
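The sample, filter, and supervised-update loop described in the SuperHF abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "policy" is a hypothetical categorical distribution over four canned completions, the reward values are invented, keeping only the single best sample per batch is an assumed filtering rule, and the direction of the KL term (current policy against a fixed uniform reference) is also an assumption, since the abstract does not specify it.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def superhf_style_step(logits, ref_probs, rewards, rng,
                       batch_size=4, beta=0.1, lr=0.5):
    """One toy update: sample, filter by reward, take a supervised + KL step."""
    probs = softmax(logits)
    # 1. Sample a batch of outputs from the current policy.
    batch = rng.choices(range(len(logits)), weights=probs, k=batch_size)
    # 2. Filter through the (hypothetical) reward model: keep the best sample.
    best = max(batch, key=lambda i: rewards[i])
    # 3. Gradient of: -log p[best] + beta * KL(p || ref), taken w.r.t. logits.
    kl = sum(p * math.log(p / q) for p, q in zip(probs, ref_probs))
    new_logits = []
    for j, (p, q) in enumerate(zip(probs, ref_probs)):
        grad_sft = p - (1.0 if j == best else 0.0)   # supervised (cross-entropy) term
        grad_kl = p * (math.log(p / q) - kl)         # KL prior term
        new_logits.append(logits[j] - lr * (grad_sft + beta * grad_kl))
    return new_logits


def train(rewards, steps=200, seed=0):
    """Run the toy loop from a uniform policy and return final probabilities."""
    rng = random.Random(seed)
    n = len(rewards)
    ref_probs = [1.0 / n] * n   # assumed reference policy: uniform
    logits = [0.0] * n
    for _ in range(steps):
        logits = superhf_style_step(logits, ref_probs, rewards, rng)
    return softmax(logits)


rewards = [0.1, 0.9, 0.4, 0.2]   # invented reward-model scores
final_probs = train(rewards)
```

In this sketch the KL weight `beta` plays the role of the prior that keeps the policy near its starting point; raising it trades reward for closeness to the reference distribution.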
Negative body experience in women with early childhood trauma: associations with trauma severity and dissociation
Background: A crucial but often overlooked impact of early life exposure to trauma is its far-reaching effect on a person’s relationship with their body. Several domains of body experience may be negatively influenced or damaged as a result of early childhood trauma. Objective: The aim of this study was to investigate disturbances in three domains of body experience: body attitude, body satisfaction, and body awareness. Furthermore, associations between domains of body experience and severity of trauma symptoms as well as frequency of dissociation were evaluated. Method: Body attitude was measured with the Dresden Body Image Questionnaire, body satisfaction with the Body Cathexis Scale, and body awareness with the Somatic Awareness Questionnaire in 50 female patients with complex trauma and compared with scores in a non-clinical female sample (n = 216). Patients in the clinical sample also filled out the Davidson Trauma Scale and the Dissociation Experience Scale. Results: In all measured domains, body experience was severely affected in patients with early childhood trauma. Compared with scores in the non-clinical group, effect sizes in Cohen’s d were 2.7 for body attitude, 1.7 for body satisfaction, and 0.8 for body awareness. Associations between domains of body experience and severity of trauma symptoms were low, as were the associations with frequency of dissociative symptoms. Conclusions: Early childhood trauma in women is associated with impairments in self-reported body experience that warrant careful assessment in the treatment of women with psychiatric disorders.
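For readers unfamiliar with the effect sizes reported above, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below uses the common pooled-variance form; the abstract does not state which variant the authors used, and the scores in the example are invented purely for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Invented scores, not data from the study.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual rule of thumb, values around 0.8 are already considered large, which puts the reported 2.7 and 1.7 well beyond that threshold.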